Setting Things Up

Installing KrigR

First of all, we need to install KrigR. Since we are currently preparing the package for submission to CRAN, it is only available through the associated GitHub repository (https://github.com/ErikKusch/KrigR) for the time being.

Here, we use the devtools package within R to gain access to the install_github() function. For this to run, we need to tell R not to abort the installation from GitHub as soon as a warning is produced. Otherwise, the installation would stop as soon as one of the KrigR dependencies produces a warning as benign as “Package XYZ was built under R Version X.X.X”, which can usually be ignored safely.

Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
devtools::install_github("https://github.com/ErikKusch/KrigR")
library(KrigR)

For the sake of this tutorial, we need some extra packages for visualisations. We check if they are already installed, subsequently install (if necessary) and load them with this user-defined function:

install.load.package <- function(x){
  if (!require(x, character.only = TRUE))
    install.packages(x, repos='http://cran.us.r-project.org')
  require(x, character.only = TRUE)
}
package_vec <- c("tidyr", # for turning rasters into ggplot-dataframes
                 "ggplot2", # for plotting
                 "viridis", # colour palettes
                 "cowplot", # gridding multiple plots
                 "ggmap", # obtaining google maps
                 "rgdal" # reading shapefiles with readOGR()
                 )
sapply(package_vec, install.load.package)
##   tidyr ggplot2 viridis cowplot   ggmap   rgdal 
##    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE

Before we can proceed, you need to open up an account at the CDS and create an API key by following this link: https://cds.climate.copernicus.eu/api-how-to. This is required so that you may issue download requests through the KrigR package. Once you have done so, please register the user ID and API Key as characters as seen below:

API_User <- "XXXXXX"
API_Key <- "XXXXXXX"
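If you plan on sharing your scripts, hard-coding credentials like this is risky. A minimal sketch of an alternative, assuming you store the credentials in your ~/.Renviron file (the variable names CDS_USER and CDS_KEY are our own suggestion, not something KrigR requires):

```r
# Read CDS credentials from environment variables instead of hard-coding
# them. Add these two lines to your ~/.Renviron file first (the names
# CDS_USER and CDS_KEY are our own suggestion):
#   CDS_USER=XXXXXX
#   CDS_KEY=XXXXXXX
API_User <- Sys.getenv("CDS_USER") # user ID from the environment
API_Key <- Sys.getenv("CDS_KEY") # API key from the environment
if (nchar(API_User) == 0 || nchar(API_Key) == 0) {
  stop("CDS credentials not found - please set them in ~/.Renviron")
}
```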

Setting Up Directories

The tutorial is designed to run completely from scratch. For this to work in a structured way, we create a folder/directory structure so that each separate Kriging process has its own tidy compartment on our hard drive. We create the following directories:
- A data directory for all of our individual Kriging processes
- A shapefile directory (located within our data directory) for all of the shapefiles that we will use

Dir.Base <- getwd() # identifying the current directory
Dir.Data <- file.path(Dir.Base, "Data") # folder path for data
Dir.Shapes <- file.path(Dir.Data, "Shapes") # folder path for shapefiles
Dirs <- sapply(c(Dir.Data, Dir.Shapes), function(x) if(!dir.exists(x)) dir.create(x))

Downloading Shapefiles

Here, we download some of the most commonly used shapefiles (for our analyses using KrigR so far, at least). For repeat code-sourcing, we have implemented checks which only trigger the download of a given shapefile set if the data in question is not present in our Shapes directory yet.

This tutorial doesn’t make use of all the shapefiles we download here. We simply thought it prudent to show you what we have found to be useful and how to get your hands on the data in a reproducible way.
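All three downloads below follow the same check-download-unzip pattern. If you find yourself obtaining many shapefiles, a small helper along these lines may be worthwhile (a sketch of our own; the function name and arguments are not part of KrigR):

```r
# Hypothetical helper wrapping the check-download-unzip pattern used in
# the chunks below; not part of KrigR.
download_shapes <- function(URL, Zip, Dir = Dir.Shapes){
  Zip_path <- file.path(Dir, Zip) # where the archive will be stored
  if(!file.exists(Zip_path)){ # only download if not present yet
    download.file(URL, destfile = Zip_path) # download archive
    unzip(Zip_path, exdir = Dir) # unzip into the shapes directory
  }
}
```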

Land Cover

As we will see in this tutorial, masking out unnecessary areas from our analyses speeds up Kriging tremendously. Often, this will entail limiting data sets to terrestrial habitats, which the land mask we download here makes easy.

#### LAND MASK (for masking covariates according to land vs. sea)
if(!file.exists(file.path(Dir.Shapes, "LandMask.zip"))){ # if not downloaded yet
  download.file(paste0("https://www.naturalearthdata.com/http//www.naturalearthdata.com",
                       "/download/10m/physical/ne_10m_land.zip"), 
                destfile = paste(Dir.Shapes, "LandMask.zip", sep="/")) # download physical vector
  unzip(paste(Dir.Shapes, "LandMask.zip", sep="/"), exdir = Dir.Shapes) # unzip data
}
LandMask <- readOGR(Dir.Shapes, "ne_10m_land", verbose = FALSE) # read
ggplot() + 
  geom_polygon(data = LandMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
  theme_bw() + labs(x = "Longitude", y = "Latitude")

States & Provinces

Political divisions are often what policy makers are after. Hence, we also download a shapefile for states and provinces across the Globe.

#### STATE MASK (for within-nation borders)
if(!file.exists(file.path(Dir.Shapes, "StateMask.zip"))){ # if not downloaded yet
  download.file(paste0("https://www.naturalearthdata.com/http//www.naturalearthdata.com/", 
                       "download/10m/cultural/ne_10m_admin_1_states_provinces.zip"), 
                destfile = paste(Dir.Shapes, "StateMask.zip", sep="/")) # download cultural vector
  unzip(paste(Dir.Shapes, "StateMask.zip", sep="/"), exdir = Dir.Shapes) # unzip data
}
StateMask <- readOGR(Dir.Shapes, "ne_10m_admin_1_states_provinces", verbose = FALSE) # read
ggplot() + 
  geom_polygon(data = StateMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
  theme_bw() + labs(x = "Longitude", y = "Latitude")

As you can see, this shapefile provides a set of rather small (geographically speaking) regions. We will use these for showing how Kriging works within this tutorial because Kriging runs in a timely manner for these regions.

Additionally, it is worth pointing out that www.naturalearthdata.com offers a large host of further shapefile data sets including (but not limited to):
- National borders
- Rivers and Lakes
- Urban areas
- Reefs and Bathymetry

We strongly recommend looking there early on in your projects for any shapefile needs you may have.

Ecoregions

When we’re not bound by political boundaries, ecoregions can often help to limit the spatial scope of our macroecological studies to manageable sizes (as far as Kriging effort is concerned).

#### ECOREGIONS (for ecoregional borders)
if(!file.exists(file.path(Dir.Shapes, "WWF_ecoregions"))){ # if not downloaded yet
  download.file(paste0("http://assets.worldwildlife.org/publications/15/files/",
                       "original/official_teow.zip"), 
                destfile = file.path(Dir.Shapes, "wwf_ecoregions.zip")) # download regions
  unzip(file.path(Dir.Shapes, "wwf_ecoregions.zip"), exdir = file.path(Dir.Shapes, "WWF_ecoregions")) # unzip data
}
EcoregionMask <- readOGR(file.path(Dir.Shapes, "WWF_ecoregions", "official", "wwf_terr_ecos.shp"), verbose = FALSE) # read
ggplot() + 
  geom_polygon(data = EcoregionMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
  theme_bw() + labs(x = "Longitude", y = "Latitude")

This particular data set offers the shapes of biomes and biogeographical realms across the Earth. As far as Kriging with the KrigR package is concerned, we strongly advise against Kriging entire biogeographical realms without further consideration, since these large regions lead to extreme processing requirements.

Plotting Functions

In order to easily visualise our Kriging procedure including (1) inputs, (2) covariates, and (3) outputs without repeating too much of the same code, we create a few plotting functions of our own here.

Don’t worry about understanding how these work off the bat here. Kriging and the package KrigR are what we want to demonstrate here - not visualisation strategies.

Raw Data

This function plots the raw data that we will krig and exports a single plot of all layers of the input raster:

Plot_Raw <- function(Raw, Shp = NULL, Dates){
  Raw_df <- as.data.frame(Raw, xy = TRUE) # turn raster into dataframe
  colnames(Raw_df)[c(-1,-2)] <- Dates # set colnames
  Raw_df <- gather(data = Raw_df, key = Values, value = "value", colnames(Raw_df)[c(-1,-2)]) #  make ggplot-ready
  Raw_plot <- ggplot() + # create a plot
    geom_raster(data = Raw_df , aes(x = x, y = y, fill = value)) + # plot the raw data
    facet_wrap(~Values) + # split raster layers up
    theme_bw() + labs(x = "Longitude", y = "Latitude") + # make plot more readable
    scale_fill_gradientn(name = "Air Temperature [K]", colours = inferno(100)) # add colour and legend
  if(!is.null(Shp)){ # if a shape has been designated
    Raw_plot <- Raw_plot + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = NA) # add shape
  }
  return(Raw_plot)} # export the plot

Covariates

This function plots the covariate data we will be using by showing each variable on both resolution levels side-by-side:

Plot_Covs <- function(Covs, Shp = NULL){
  Plots_ls <- as.list(rep(NA, nlayers(Covs[[1]])*2)) # create as many plots as there are covariate variables * 2
  Counter <- 1 # counter for list position
  for(Variable in 1:nlayers(Covs[[1]])){ # loop over all covariate variables
    Covs_Iter <- list(Covs[[1]][[Variable]], Covs[[2]][[Variable]]) # extract the data for this variable
    for(Plot in 1:2){ # loop over both resolutions for the current variable
      Cov_df <- as.data.frame(Covs_Iter[[Plot]], xy = TRUE) # turn raster into data frame
      colnames(Cov_df)[3] <- "Values" # assign a column name to the data column
      Plots_ls[[Counter]] <- ggplot() + # create plot
        geom_raster(data = Cov_df , aes(x = x, y = y, fill = Values)) + # plot the covariate data
        theme_bw() + labs(x = "Longitude", y = "Latitude") + # make plot more readable
        scale_fill_gradientn(name = names(Covs_Iter[[Plot]]), colours = cividis(100)) + # add colour and legend
        theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
        if(!is.null(Shp)){ # if a shape has been designated
          Plots_ls[[Counter]] <- Plots_ls[[Counter]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = NA) # add shape
        }
      Counter <- Counter + 1 # raise list counter
    } # end of resolution loop
  } # end of variable loop
  ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 2, labels = "AUTO") # fuse the plots into one big plot
  return(ggPlot)} # export the plot

Kriged Products

This function plots the Kriging outputs by splitting them up according to whether they are Kriging predictions or the uncertainties associated with them:

Plot_Krigs <- function(Krigs, Shp = NULL, Dates){
  Type_vec <- c("Prediction", "Standard Error") # these are the output types of krigR
  Colours_ls <- list(inferno(100), rev(viridis(100))) # we want separate colours for the types
  Plots_ls <- vector("list", 2) # this list will be filled with the output plots
  for(Plot in 1:2){ # loop over both output types
    Krig_df <- as.data.frame(Krigs[[Plot]], xy = TRUE) # turn raster into dataframe
    colnames(Krig_df)[c(-1,-2)] <- paste(Type_vec[Plot], Dates) # set colnames
    Krig_df <- gather(data = Krig_df, key = Values, value = "value", colnames(Krig_df)[c(-1,-2)]) # make ggplot-ready
    Plots_ls[[Plot]] <- ggplot() + # create plot
      geom_raster(data = Krig_df , aes(x = x, y = y, fill = value)) + # plot the kriged data
      facet_wrap(~Values) + # split raster layers up
      theme_bw() + labs(x = "Longitude", y = "Latitude") + # make plot more readable
      scale_fill_gradientn(name = "Air Temperature [K]", colours = Colours_ls[[Plot]]) + # add colour and legend
      theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
    if(!is.null(Shp)){ # if a shape has been designated
          Plots_ls[[Plot]] <- Plots_ls[[Plot]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = NA) # add shape
        }
  } # end of type-loop
  ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 1, labels = "AUTO") # fuse the plots into one big plot
  return(ggPlot)} # export the plot

Kriging - the three steps

Using KrigR in this way relies on the functions download_ERA(), download_DEM(), and krigR(). Running these functions by themselves, as opposed to executing the KrigR pipeline (which we cover later in this tutorial), gives you the most control and oversight of the KrigR workflow.

We will start with a simple Kriging process and subsequently make it more sophisticated.

By Extent

The simplest way to run the functions of the KrigR package is to specify a rectangular bounding box (i.e. an extent) for your study region(s). Here, we will run a small downscaling exercise for my birth-state of Saxony. It is a medium-sized state in the east of Germany and lends itself well to statistical downscaling since it features mountainous regions along the south-eastern border and flatland areas towards the north-western edges.

Here’s the full area we will be obtaining and downscaling data for:

Extent <- extent(c(11.8, 15.1, 50.1, 51.7)) # roughly the extent of Saxony
GoogMap <- get_map(c(left = Extent[1], 
                     right = Extent[2], 
                     bottom = Extent[3], 
                     top = Extent[4]))
ggmap(GoogMap) + 
  theme_bw() + labs(x = "Longitude", y = "Latitude")

Here, you can see a map of the region with the mountains of the Erzgebirge along the southern border to the Czech Republic as well as the flat terrains around Leipzig in the north of Saxony.

Now, let’s create a separate directory within which we will store the raw ERA5-land data, GMTED2010 DEM covariate data, and Kriging outputs:

Dir.StateExt <- file.path(Dir.Data, "State_Extent")
dir.create(Dir.StateExt)

Climate Data

Now we are ready to carry out our first download of climate reanalysis data through the KrigR package!

For this part of the tutorial, we download air temperature for a three-day interval around my birthday (03-01-1995) using the extent of my home-state of Saxony.

Notice that the downloading of ERA-family reanalysis data may take a short while to start as the download request gets queued with the CDS of the ECMWF before it is executed.

State_Raw <- download_ERA(
  Variable = '2m_temperature',
  DataSet = 'era5-land',
  DateStart = '1995-01-02',
  DateStop = '1995-01-04',
  TResolution = 'day',
  TStep = 1,
  Extent = Extent,
  Dir = Dir.StateExt,
  API_User = API_User,
  API_Key = API_Key)
Plot_Raw(State_Raw, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

Now, let’s look at the raster that was produced:

State_Raw
## class      : RasterBrick 
## dimensions : 17, 34, 578, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.1, 0.1000001  (x, y)
## extent     : 11.75, 15.15, 50.05, 51.75  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      :  index_1,  index_2,  index_3 
## min values : 267.8741, 266.8182, 262.5400 
## max values : 273.1816, 271.9754, 269.3639

As you can see, we obtained a RasterBrick object with 3 layers of data (one for each day we are interested in). Additionally, as could be gleaned from the plots above, it is apparent that my home-state got a lot cooler on the day after my birth (04/01/1995) when compared to the two preceding days. What to make of this, I leave up to you, because I don’t think I like the interpretation myself if we were to assume causality here.
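The cooling is also easy to quantify numerically. A quick sketch using raster::cellStats() (the exact means depend on the downloaded data, so we don't list them here):

```r
# Spatial mean air temperature per layer (i.e. per day)
LayerMeans <- cellStats(State_Raw, stat = "mean")
names(LayerMeans) <- c("02-01-1995", "03-01-1995", "04-01-1995")
LayerMeans # mean air temperature [K] per day
diff(LayerMeans) # day-to-day change; negative values indicate cooling
```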

Keep in mind that every function within the KrigR package produces NetCDF (.nc) files in the specified directory (Dir argument in the function call) to allow for further manipulation outside of R if necessary (for example, using Panoply).
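These .nc files can also be read straight back into R in a later session. A sketch (the file name below is a placeholder - check your Dir.StateExt directory for the name KrigR actually assigned; reading NetCDF with raster requires the ncdf4 package to be installed):

```r
# Re-import a KrigR-produced NetCDF file as a RasterBrick.
# "SomeFile.nc" is a placeholder file name, not one KrigR guarantees.
library(raster) # attached as a KrigR dependency; shown for completeness
Reloaded_ras <- brick(file.path(Dir.StateExt, "SomeFile.nc")) # read all layers
```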

Covariates

Next, we use the download_DEM() function which comes with KrigR to obtain elevation data as our covariate of choice. This produces two rasters:

  1. A raster of training resolution which matches the input data in all attributes except for the data in each cell
  2. A raster of target resolution which matches the input data as closely as possible in all attributes except for the resolution (which is specified by the user) and the data in each cell

Both of these products are bundled into a list where the first element corresponds to the training resolution and the second element contains the target resolution covariate data. Here, we specify a target resolution of .02.

Covs_ls <- download_DEM(Train_ras = State_Raw,
                        Target_res = .02,
                        Dir = Dir.StateExt,
                        Keep_Temporary = TRUE)
Plot_Covs(Covs_ls)

Alternatively, you can specify a different raster which should be matched in all attributes by the raster at target resolution. We get to this again at a later point in this tutorial.

For now, let’s simply inspect our list of covariate rasters:

Covs_ls
## [[1]]
## class      : RasterLayer 
## dimensions : 17, 34, 578  (nrow, ncol, ncell)
## resolution : 0.1, 0.1000001  (x, y)
## extent     : 11.75, 15.15, 50.05, 51.75  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : DEM 
## values     : 75.02774, 879.5154  (min, max)
## 
## 
## [[2]]
## class      : RasterLayer 
## dimensions : 102, 204, 20808  (nrow, ncol, ncell)
## resolution : 0.01666667, 0.01666667  (x, y)
## extent     : 11.74986, 15.14986, 50.04986, 51.74986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : DEM 
## values     : 46.5, 1150.25  (min, max)

This set of covariates has 578 and 20808 cells containing values at training and target resolution, respectively.
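You can confirm the refinement factor directly from the rasters (a sketch using accessors from the raster package):

```r
# Ratio of cell edge lengths: the target raster is ~6 times finer
res(Covs_ls[[1]])[1] / res(Covs_ls[[2]])[1]
# Ratio of cell counts: ~36 times as many cells at target resolution
ncell(Covs_ls[[2]]) / ncell(Covs_ls[[1]])
```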

Kriging

Now let’s krig these data:

KrigStart <- Sys.time()
State_Krig <- krigR(Data = State_Raw,
      Covariates_coarse = Covs_ls[[1]],
      Covariates_fine = Covs_ls[[2]],
      Keep_Temporary = FALSE,
      Cores = 1,
      FileName = "State_Ext.nc",
      Dir = Dir.StateExt)
KrigStop <- Sys.time()
Plot_Krigs(State_Krig, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))

There we go. All of the data has been downscaled and we have uncertainties recorded for all of our outputs. As you can see, the elevation patterns show up clearly in our kriged air temperature output. Furthermore, you can see that the certainty of our Kriging predictions drops on 04/01/1995 compared to the two preceding days. However, do keep in mind that maximum standard errors of 0.206, 0.215, and 0.333 (for each layer of our output, respectively) against total data ranges of 6.543, 6.289, and 7.657 (again, for each layer, respectively) are evidence of a downscaling result we can be confident in.
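The standard errors and data ranges quoted above can be pulled out of the output programmatically (a sketch using accessors from the raster package):

```r
# Maximum standard error per layer of the kriged output
maxValue(State_Krig[["Kriging_SE"]])
# Total range of the predictions per layer
maxValue(State_Krig[["Kriging_Output"]]) - minValue(State_Krig[["Kriging_Output"]])
```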

Now, what does the output actually look like?

State_Krig
## $Kriging_Output
## class      : RasterBrick 
## dimensions : 102, 204, 20808, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667  (x, y)
## extent     : 11.74986, 15.14986, 50.04986, 51.74986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : var1.pred.1, var1.pred.2, var1.pred.3 
## min values :    266.8200,    265.7288,    261.5947 
## max values :    273.3627,    272.0174,    269.2518 
## 
## 
## $Kriging_SE
## class      : RasterBrick 
## dimensions : 102, 204, 20808, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667  (x, y)
## extent     : 11.74986, 15.14986, 50.04986, 51.74986  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : var1.stdev.1, var1.stdev.2, var1.stdev.3 
## min values :    0.1641075,    0.1778261,    0.2574695 
## max values :    0.2059327,    0.2150844,    0.3331758

The krigR() function returns a list containing the downscaled data as its first element and the downscaling standard errors as its second element.

Finally, let’s see how long it took to carry out this extent-based downscaling:

BaseTime <- KrigStop-KrigStart
BaseTime
## Time difference of 36.93623 secs
